Using XML for Long-term Preservation: Experiences from the DiVA Project
Abstract
One of the objectives of the DiVA project is to explore the possibility of using XML as a format for long-term preservation. For this reason, the practical use of XML in different parts of the system was evaluated before deciding on the design. The DiVA Document Format, defined by an XML schema, has been developed to describe the inter-relationships among the various data elements and processes, and to support long-term preservation of the actual documents. XML Schema provides a means of defining the structure, content and semantics of XML documents. It is an XML-based alternative to the XML Document Type Definition (DTD). Because one of the primary reasons for using XML was to support long-term preservation, the most popular DTDs for documents, DocBook and TEI, were evaluated. Limitations regarding metadata descriptions were found in both of these DTDs, so the decision was made to develop a new structure for DiVA using XML Schema. This schema combines the DocBook Schema (derived from the DocBook DTD) for the textual parts of the document with an internal schema for all metadata (bibliographic and administrative data).

Using the DiVA Document Format for content management and inter-process communication, several applications were developed. Some of their purposes are essential for long-term preservation:
• Make persistent National Bibliographic Numbers (NBN) available to the URN resolution service [1] at the Royal Library in Stockholm.
• Send MARC21 records in MARC-XML to the National Library.
• Create archival file packages for long-term preservation, checksum them, store them in the DiVA Archive and send a copy of them to the Swedish Royal Library.

Currently the file archives for long-term preservation contain the original full-text file in various formats and the DiVA Document Format file, which contains all the metadata about the document. Furthermore, the DiVA Document Format file contains all parts of the full-text file that can be converted into XML. In the future it might be possible to convert the whole full text into XML, in which case the file archives would contain only DiVA Document Format files.

Preface

DiVA, Digitala vetenskapliga arkivet (DiVA Archive), is a comprehensive description of a searchable archive containing all documents that are published in electronic form at Uppsala University in Sweden. Other Swedish universities are also co-operating in the project within the DiVA framework. One part of this archive is the database containing theses published at Uppsala University from 1998 to date. In September 2000 an Electronic Publishing Centre was established at Uppsala University Library. Its primary assignment was a project to create technical solutions and a well-functioning workflow for electronic posting and full-text publication of doctoral theses, essays, working papers and other types of scientific publications. The first phase of the project was completed in 2002 and the result was the DiVA Publishing System, a system for electronic publishing of different types of publications.

One of the goals has been to create a long-term archive containing all digital documents published at Uppsala University. The assignment involves both technical and organisational issues, and the development team was faced with many questions. How can the loss of data be avoided? What kind of descriptive and administrative metadata is useful for archiving? What is the appropriate metadata format for long-term preservation? How important is the layout of the objects and how is it to be handled? How can images and formulas be handled? Because of these questions, XML was discussed early on as a format for storing descriptive and administrative metadata, as well as the complete content of the documents. XML represents a format that is easy to restore and understand by both humans and machines. This paper will describe the current status of the XML implementation in DiVA Archive and the surrounding applications, and why XML is an important format for long-term preservation.

[1] http://urn.kb.se/resolve

XML as Long-term Preservation
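The claim that XML is easy for both humans and machines to read, together with the checksumming of archival files mentioned in the abstract, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the element names and the URN value are hypothetical and do not reproduce the DiVA Document Format, and SHA-256 is chosen only as an example, since the text does not say which checksum algorithm DiVA uses.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical record: element names and the URN are illustrative only,
# not taken from the DiVA Document Format schema.
RECORD = """<document>
  <metadata>
    <title>Example thesis</title>
    <urn>urn:nbn:se:example-0000</urn>
  </metadata>
  <body>
    <para>First paragraph of the full text.</para>
  </body>
</document>"""


def read_metadata(xml_text: str) -> dict:
    """Return the children of <metadata> as a tag -> text mapping."""
    root = ET.fromstring(xml_text)
    metadata = root.find("metadata")
    return {child.tag: (child.text or "").strip() for child in metadata}


def sha256_hex(data: bytes) -> str:
    """Return a hex digest that can be stored alongside an archived file.

    SHA-256 is an assumption; the source does not name an algorithm.
    """
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # The same text a person can read is parsed without custom tooling.
    print(read_metadata(RECORD))
    # Recomputing this digest later detects any change to the stored bytes.
    print(sha256_hex(RECORD.encode("utf-8")))
```

Storing the digest with the package lets an archive later verify that a file has not changed, while the plain-text markup remains legible even without the software that produced it.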